Emergent unsupervised clustering paradigms with potential application to bioinformatics.
Authors
Abstract
In recent years, there has been a great upsurge in the application of data clustering, statistical classification, and related machine learning techniques to the field of molecular biology, in particular analysis of DNA microarray expression data. Clustering methods can be used to group co-expressed genes, shedding light on gene function and co-regulation. Alternatively, they can group samples or conditions to identify phenotypical groups, disease subgroups, or to help identify disease pathways. A rich variety of unsupervised techniques have been applied, including partitional, hierarchical, graph-based, model-based, and biclustering methods. While a number of machine learning problems and tools have found mainstream applications in bioinformatics, in this article we identify some challenging problems which, though clearly relevant to bioinformatics, have not been extensively investigated in this domain. These include i) unsupervised clustering with unsupervised feature selection, ii) semisupervised learning, iii) unsupervised learning (and supervised learning) in the presence of confounding variables, and iv) stability of clustering solutions. We review recent methods which address these problems and take the position that these methods are well-suited to addressing some common scenarios that occur in bioinformatics.
Similar resources
Improved Automatic Clustering Using a Multi-Objective Evolutionary Algorithm With New Validity measure and application to Credit Scoring
In data mining, clustering is an important task for separating and grouping unlabeled data. This paper attempts to improve and optimize the application of heuristic clustering methods, such as the genetic algorithm, PSO, the artificial bee colony algorithm, harmony search, and differential evolution, to the unlabeled data of an Iranian bank...
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Unsupervised pattern recognition: An introduction to the whys and wherefores of clustering microarray data
Clustering has become an integral part of microarray data analysis and interpretation. The algorithmic basis of clustering -- the application of unsupervised machine-learning techniques to identify the patterns inherent in a data set -- is well established. This review discusses the biological motivations for and applications of these techniques to integrating gene expression data with other bi...
All Health Partnerships, Great and Small: Comparing Mandated With Emergent Health Partnerships; Comment on “Evaluating Global Health Partnerships: A Case Study of a Gavi HPV Vaccine Application Process in Uganda”
The plurality of healthcare providers and funders in low- and middle-income countries (LMICs) has given rise to an era in which health partnerships are becoming the norm in international development. Whether mandated or emergent, three common drivers are essential for ensuring successful health partnerships: trust; a diverse and inclusive network; and a clear governance structure. Mandated and ...
Characterization of Linkage-based Clustering
Clustering is a central unsupervised learning task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, unlike classification tasks, in clustering, different algorithms may yield dramatically different outputs for the same input sets. A major challenge is to develop tools that may help select the more suitable algorithm for a given clustering t...
Journal title:
- Frontiers in bioscience : a journal and virtual library
Volume 13, Issue
Pages -
Publication date 2008